Executive Summary

Bike rentals are a growing industry, but one that faces both weather-based and seasonal variations in demand. It is important to quantify how those variations affect demand because demand, in turn, determines the number of employees needed as well as the number of bikes that must be available. Significant mismatches in bike availability or staffing can result in missed rental opportunities (no bike available, or too few employees for a timely rental) or, conversely, in extra expenses for the rental company in wages or bike purchases.

Using a publicly available dataset (described below), I attempt to answer two basic questions:

Q1: What are the factors that determine demand?

Q2: Do these factors vary by season?

The dataset consists of 17,379 hourly rental records collected over a two-year period (2011 and 2012). For this analysis the records have been aggregated into 731 daily records, because the day is the unit that determines how many bikes and employees are needed.
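A minimal sketch of that hourly-to-daily aggregation in base R, assuming a hypothetical hourly data frame `hourly` with columns `date`, `casual` and `registered` (the real hourly file may use different column names). Each rider count is summed within a date, giving one row per day:

```r
# Hypothetical hourly records: three hours on each of two days
hourly <- data.frame(
  date       = c("2011-01-01", "2011-01-01", "2011-01-01",
                 "2011-01-02", "2011-01-02", "2011-01-02"),
  casual     = c(3, 8, 5, 0, 1, 2),
  registered = c(13, 32, 27, 1, 1, 2)
)
hourly$count <- hourly$casual + hourly$registered

# Sum each rider count within a date, giving one row per day
daily <- aggregate(cbind(casual, registered, count) ~ date,
                   data = hourly, FUN = sum)
print(daily)
```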

Data attributes are both categorical and numerical, and the statistical analyses vary accordingly.

The Data Source

The data are taken from the UCI Machine Learning Repository (Bike Sharing Dataset). The original data are reported and analyzed in a paper by Fanaee-T, H. and Gama, J.

Data Description

## [1] 731  16

As noted previously, the data consist of 731 daily records with 15 variables per record (the 16th is a record number). The str() output below lists each variable's name, its data type, and its first few values:

day <- read.csv('day.csv')
str(day)
## 'data.frame':    731 obs. of  16 variables:
##  $ instance  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ date      : Factor w/ 731 levels "1/1/11","1/1/12",..: 1 23 45 51 53 55 57 59 61 3 ...
##  $ season    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ year      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ month     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ holiday   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ weekday   : int  6 0 1 2 3 4 5 6 0 1 ...
##  $ workingday: int  0 0 1 1 1 1 1 0 0 1 ...
##  $ conditions: int  2 2 1 1 1 1 2 2 1 1 ...
##  $ temp      : num  0.344 0.363 0.196 0.2 0.227 ...
##  $ felt_temp : num  0.364 0.354 0.189 0.212 0.229 ...
##  $ hum       : num  0.806 0.696 0.437 0.59 0.437 ...
##  $ windspeed : num  0.16 0.249 0.248 0.16 0.187 ...
##  $ casual    : int  331 131 120 108 82 88 148 68 54 41 ...
##  $ registered: int  654 670 1229 1454 1518 1518 1362 891 768 1280 ...
##  $ count     : int  985 801 1349 1562 1600 1606 1510 959 822 1321 ...
Description of Attributes
  1. instance: Record id number
  2. date: Date (Factor w/ 731 levels “1/1/11”,“1/1/12”,…)
  3. season: Season (1:spring, 2:summer, 3:fall, 4:winter, as labelled in the dataset documentation; note that the January records shown above are coded 1)
  4. year: Year (0:2011, 1:2012)
  5. month: Month (1 to 12)
  6. holiday: Holiday? (0:no, 1:yes)
  7. weekday: Day of Week. Numbered 0-6, beginning with Sunday (0); the first record, 1 January 2011, was a Saturday (6).
  8. workingday: Working day? (0:no, 1:yes)
  9. conditions: Weather Conditions: (1: Clear or Partly cloudy, 2: Mist with Few Clouds up to Cloudy, 3: Light Snow or Light Rain with Scattered Clouds/Cloudy/Thunderstorm, 4: Heavy Rain, Hail or Thunderstorm or Mist or Snow + Fog)
  10. temp: Normalized temperature in Celsius. The values are divided by 41 (max).
  11. felt_temp: Normalized feeling temperature in Celsius. The values are divided by 50 (max).
  12. hum: Normalized humidity. The values are divided by 100 (max).
  13. windspeed: Normalized wind speed. The values are divided by 67 (max).
  14. casual: Count of casual users
  15. registered: Count of registered users
  16. count: Count of total bike rentals including both casual and registered
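Several of these attributes are integer codes or rescaled values. A minimal sketch of a hypothetical helper, `label_day_data()`, that turns the codes into labelled factors and undoes the normalizations, assuming the divisors listed above (41, 50, 100, 67), the column names used in this document, and the season labels as documented:

```r
# Hypothetical helper: readable factor labels plus unnormalized weather values
label_day_data <- function(df) {
  df$season <- factor(df$season, levels = 1:4,
                      labels = c("spring", "summer", "fall", "winter"))
  df$weekday <- factor(df$weekday, levels = 0:6,
                       labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
  # Keep all four condition codes as levels, even if code 4 never occurs
  df$conditions <- factor(df$conditions, levels = 1:4,
                          labels = c("clear", "mist", "light precip", "heavy precip"))
  # Undo the max-based normalizations described in the attribute list
  df$temp_c      <- df$temp * 41
  df$felt_temp_c <- df$felt_temp * 50
  df$hum_pct     <- df$hum * 100
  df$wind_raw    <- df$windspeed * 67
  df
}

# First two records from the str() output above
sample_days <- data.frame(season = c(1, 1), weekday = c(6, 0), conditions = c(2, 2),
                          temp = c(0.344, 0.363), felt_temp = c(0.364, 0.354),
                          hum = c(0.806, 0.696), windspeed = c(0.16, 0.249))
labelled <- label_day_data(sample_days)
labelled[, c("weekday", "conditions", "temp_c", "hum_pct")]
```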
summary(day)
##     instance          date         season           year       
##  Min.   :  1.0   1/1/11 :  1   Min.   :1.000   Min.   :0.0000  
##  1st Qu.:183.5   1/1/12 :  1   1st Qu.:2.000   1st Qu.:0.0000  
##  Median :366.0   1/10/11:  1   Median :3.000   Median :1.0000  
##  Mean   :366.0   1/10/12:  1   Mean   :2.497   Mean   :0.5007  
##  3rd Qu.:548.5   1/11/11:  1   3rd Qu.:3.000   3rd Qu.:1.0000  
##  Max.   :731.0   1/11/12:  1   Max.   :4.000   Max.   :1.0000  
##                  (Other):725                                   
##      month          holiday           weekday        workingday   
##  Min.   : 1.00   Min.   :0.00000   Min.   :0.000   Min.   :0.000  
##  1st Qu.: 4.00   1st Qu.:0.00000   1st Qu.:1.000   1st Qu.:0.000  
##  Median : 7.00   Median :0.00000   Median :3.000   Median :1.000  
##  Mean   : 6.52   Mean   :0.02873   Mean   :2.997   Mean   :0.684  
##  3rd Qu.:10.00   3rd Qu.:0.00000   3rd Qu.:5.000   3rd Qu.:1.000  
##  Max.   :12.00   Max.   :1.00000   Max.   :6.000   Max.   :1.000  
##                                                                   
##    conditions         temp           felt_temp            hum        
##  Min.   :1.000   Min.   :0.05913   Min.   :0.07907   Min.   :0.0000  
##  1st Qu.:1.000   1st Qu.:0.33708   1st Qu.:0.33784   1st Qu.:0.5200  
##  Median :1.000   Median :0.49833   Median :0.48673   Median :0.6267  
##  Mean   :1.395   Mean   :0.49538   Mean   :0.47435   Mean   :0.6279  
##  3rd Qu.:2.000   3rd Qu.:0.65542   3rd Qu.:0.60860   3rd Qu.:0.7302  
##  Max.   :3.000   Max.   :0.86167   Max.   :0.84090   Max.   :0.9725  
##                                                                      
##    windspeed           casual         registered       count     
##  Min.   :0.02239   Min.   :   2.0   Min.   :  20   Min.   :  22  
##  1st Qu.:0.13495   1st Qu.: 315.5   1st Qu.:2497   1st Qu.:3152  
##  Median :0.18097   Median : 713.0   Median :3662   Median :4548  
##  Mean   :0.19049   Mean   : 848.2   Mean   :3656   Mean   :4504  
##  3rd Qu.:0.23321   3rd Qu.:1096.0   3rd Qu.:4776   3rd Qu.:5956  
##  Max.   :0.50746   Max.   :3410.0   Max.   :6946   Max.   :8714  
## 

Distributions of Individual Variables

hist(day$conditions)

Notice that there are no “4” values and very few “3” values (n = 21); most days are clear or partly cloudy (1) or misty to cloudy (2).
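Because `conditions` is a categorical code, a frequency table is a more direct summary than a histogram. A minimal sketch on a hypothetical vector of codes (with the real data the input would be `day$conditions`); fixing the factor levels to 1:4 makes the empty category 4 appear with a zero count:

```r
# Hypothetical daily condition codes
conditions <- c(1, 1, 2, 1, 3, 2, 1, 2, 1, 1)

# Declaring all four levels keeps category 4 in the table even with no occurrences
counts <- table(factor(conditions, levels = 1:4))
print(counts)

# Bar chart of the counts, one bar per code
barplot(counts, xlab = "Weather conditions code", ylab = "Number of days")
```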

hist(day$temp)

hist(day$felt_temp)

hist(day$hum)

hist(day$windspeed)

hist(day$casual)

hist(day$registered)

hist(day$count)

Visual Data Exploration

plot(day$conditions, day$casual)

plot(day$conditions, day$registered)

plot(day$hum, day$casual)

plot(day$hum, day$registered)

plot(day$windspeed, day$casual)

plot(day$windspeed, day$registered)

plot(day$temp, day$casual)

plot(day$temp, day$registered)

plot(day$felt_temp, day$casual)

plot(day$felt_temp, day$registered)

plot(day$temp, day$felt_temp)

cor(day$temp, day$felt_temp)
## [1] 0.9917016
plot(day$temp, day$hum)

cor.test(day$temp, day$hum)
## 
##  Pearson's product-moment correlation
## 
## data:  day$temp and day$hum
## t = 3.456, df = 729, p-value = 0.0005801
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.05495529 0.19765680
## sample estimates:
##       cor 
## 0.1269629
plot(day$temp, day$windspeed)

cor.test(day$temp, day$windspeed)
## 
##  Pearson's product-moment correlation
## 
## data:  day$temp and day$windspeed
## t = -4.3187, df = 729, p-value = 1.787e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2278482 -0.0864203
## sample estimates:
##        cor 
## -0.1579441
plot(day$hum, day$windspeed)

cor.test(day$hum, day$windspeed)
## 
##  Pearson's product-moment correlation
## 
## data:  day$hum and day$windspeed
## t = -6.9265, df = 729, p-value = 9.488e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3153210 -0.1792046
## sample estimates:
##        cor 
## -0.2484891
plot(day$conditions, day$windspeed)

cor.test(day$conditions, day$windspeed)
## 
##  Pearson's product-moment correlation
## 
## data:  day$conditions and day$windspeed
## t = 1.0676, df = 729, p-value = 0.286
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.03309737  0.11170461
## sample estimates:
##        cor 
## 0.03951106
cor.test(day$conditions, day$windspeed, method='spearman')
## Warning in cor.test.default(day$conditions, day$windspeed, method =
## "spearman"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  day$conditions and day$windspeed
## S = 64014000, p-value = 0.6517
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## 0.01672523
plot(day$conditions, day$hum)

cor.test(day$conditions, day$hum, method ='spearman')
## Warning in cor.test.default(day$conditions, day$hum, method = "spearman"):
## Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  day$conditions and day$hum
## S = 26267000, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##      rho 
## 0.596532
plot(day$conditions, day$temp)

cor.test(day$conditions, day$temp)
## 
##  Pearson's product-moment correlation
## 
## data:  day$conditions and day$temp
## t = -3.2802, df = 729, p-value = 0.001087
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1914416 -0.0485129
## sample estimates:
##        cor 
## -0.1206022
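The pairwise checks above can be summarized in one step: `cor()` applied to a data frame returns the full correlation matrix, and `method = "spearman"` gives the rank-based version used above for the ordinal `conditions` code. `cor()` returns coefficients only, so `cor.test()` is still needed for p-values and confidence intervals. A minimal sketch on hypothetical simulated weather columns (with the real data, `day[, c("temp", "felt_temp", "hum", "windspeed")]` would take their place):

```r
set.seed(42)
n <- 100
# Hypothetical weather data; felt_temp is built to track temp closely
weather <- data.frame(temp = runif(n), hum = runif(n), windspeed = runif(n))
weather$felt_temp <- weather$temp + rnorm(n, sd = 0.01)

r_pearson  <- cor(weather)                       # Pearson, every pair of columns
r_spearman <- cor(weather, method = "spearman")  # rank-based alternative
round(r_pearson, 2)
```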

Correlations and Linear Regressions (single variables)

NUMBER OF USERS vs. CONDITIONS

cor.test(day$conditions, day$casual)
## 
##  Pearson's product-moment correlation
## 
## data:  day$conditions and day$casual
## t = -6.8927, df = 729, p-value = 1.186e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3142304 -0.1780327
## sample estimates:
##       cor 
## -0.247353
cor.test(day$conditions, day$registered)
## 
##  Pearson's product-moment correlation
## 
## data:  day$conditions and day$registered
## t = -7.2817, df = 729, p-value = 8.566e-13
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3267321 -0.1914898
## sample estimates:
##        cor 
## -0.2603877
casual_users = lm(day$casual ~ day$conditions)
casual_users
## 
## Call:
## lm(formula = day$casual ~ day$conditions)
## 
## Coefficients:
##    (Intercept)  day$conditions  
##         1283.1          -311.7
reg_users = lm(day$registered ~ day$conditions)
reg_users
## 
## Call:
## lm(formula = day$registered ~ day$conditions)
## 
## Coefficients:
##    (Intercept)  day$conditions  
##         4696.5          -745.6
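These slopes treat `conditions` as a number, which assumes that the step from code 1 to 2 costs as many riders as the step from 2 to 3. A minimal sketch of how that assumption can be checked, using hypothetical simulated counts rather than the bike data: fitting `factor(cond)` estimates a separate mean for each level, and because the straight-line model is nested inside the factor model, `anova()` compares the two with an F-test for lack of linearity.

```r
set.seed(7)
# Hypothetical design: many clear days, fewer misty days, few rainy/snowy days
cond <- rep(1:3, times = c(180, 100, 20))
# Hypothetical riders: the drop from 2 to 3 is much larger than from 1 to 2
riders <- c(900, 750, 200)[cond] + rnorm(length(cond), sd = 50)

fit_numeric <- lm(riders ~ cond)          # one slope per code step
fit_factor  <- lm(riders ~ factor(cond))  # separate mean for each code
anova(fit_numeric, fit_factor)            # F-test for lack of linearity
```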

NUMBER OF USERS vs. HUMIDITY

cor.test(day$hum, day$casual)
## 
##  Pearson's product-moment correlation
## 
## data:  day$hum and day$casual
## t = -2.0854, df = 729, p-value = 0.03738
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.148691172 -0.004519522
## sample estimates:
##         cor 
## -0.07700788
cor.test(day$hum, day$registered)
## 
##  Pearson's product-moment correlation
## 
## data:  day$hum and day$registered
## t = -2.4697, df = 729, p-value = 0.01375
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.16252867 -0.01869851
## sample estimates:
##        cor 
## -0.0910886
casual_users = lm(day$casual ~ day$hum)
casual_users
## 
## Call:
## lm(formula = day$casual ~ day$hum)
## 
## Coefficients:
## (Intercept)      day$hum  
##      1081.3       -371.2
reg_users = lm(day$registered ~ day$hum)
reg_users
## 
## Call:
## lm(formula = day$registered ~ day$hum)
## 
## Coefficients:
## (Intercept)      day$hum  
##      4282.7       -997.8

NUMBER OF USERS vs. WINDSPEED

cor.test(day$windspeed, day$casual)
## 
##  Pearson's product-moment correlation
## 
## data:  day$windspeed and day$casual
## t = -4.5905, df = 729, p-value = 5.207e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.23724343 -0.09626984
## sample estimates:
##        cor 
## -0.1676133
cor.test(day$windspeed, day$registered)
## 
##  Pearson's product-moment correlation
## 
## data:  day$windspeed and day$registered
## t = -6.0151, df = 729, p-value = 2.844e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2854614 -0.1472573
## sample estimates:
##       cor 
## -0.217449
casual_users = lm(day$casual ~ day$windspeed)
casual_users
## 
## Call:
## lm(formula = day$casual ~ day$windspeed)
## 
## Coefficients:
##   (Intercept)  day$windspeed  
##          1131          -1485
reg_users = lm(day$registered ~ day$windspeed)
reg_users
## 
## Call:
## lm(formula = day$registered ~ day$windspeed)
## 
## Coefficients:
##   (Intercept)  day$windspeed  
##          4490          -4378

NUMBER OF USERS vs. ACTUAL TEMPERATURE

cor.test(day$temp, day$casual)
## 
##  Pearson's product-moment correlation
## 
## data:  day$temp and day$casual
## t = 17.472, df = 729, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4900779 0.5924581
## sample estimates:
##       cor 
## 0.5432847
cor.test(day$temp, day$registered)
## 
##  Pearson's product-moment correlation
## 
## data:  day$temp and day$registered
## t = 17.323, df = 729, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4865508 0.5894440
## sample estimates:
##      cor 
## 0.540012
casual_users = lm(day$casual ~ day$temp)
casual_users
## 
## Call:
## lm(formula = day$casual ~ day$temp)
## 
## Coefficients:
## (Intercept)     day$temp  
##      -161.3       2037.9
reg_users = lm(day$registered ~ day$temp)
reg_users
## 
## Call:
## lm(formula = day$registered ~ day$temp)
## 
## Coefficients:
## (Intercept)     day$temp  
##        1376         4603
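Assuming, as the attribute list states, that `temp` is degrees Celsius divided by 41, a slope of 2037.9 means 2037.9 extra casual riders across the full 41-degree range, or about 2037.9 / 41 ≈ 49.7 casual riders per degree; the registered slope works out to roughly 4603 / 41 ≈ 112.3 per degree. The sketch below, on hypothetical simulated data, checks that refitting on the rescaled variable divides the slope by exactly that factor:

```r
set.seed(3)
# Hypothetical normalized temperatures and casual counts
temp_norm <- runif(50, 0.05, 0.86)
casual    <- -161 + 2038 * temp_norm + rnorm(50, sd = 300)

fit_norm    <- lm(casual ~ temp_norm)          # slope per normalized unit
fit_celsius <- lm(casual ~ I(temp_norm * 41))  # slope per degree Celsius

# The two slopes differ only by the factor of 41
coef(fit_norm)[[2]] / 41
coef(fit_celsius)[[2]]

# Applying the same conversion to the slopes reported above
round(c(casual = 2037.9, registered = 4603) / 41, 1)
```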

NUMBER OF USERS vs. FELT TEMPERATURE

cor.test(day$felt_temp, day$casual)
## 
##  Pearson's product-moment correlation
## 
## data:  day$felt_temp and day$casual
## t = 17.499, df = 729, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4907022 0.5929912
## sample estimates:
##       cor 
## 0.5438637
cor.test(day$felt_temp, day$registered)
## 
##  Pearson's product-moment correlation
## 
## data:  day$felt_temp and day$registered
## t = 17.514, df = 729, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4910559 0.5932932
## sample estimates:
##       cor 
## 0.5441918
casual_users = lm(day$casual ~ day$felt_temp)
casual_users
## 
## Call:
## lm(formula = day$casual ~ day$felt_temp)
## 
## Coefficients:
##   (Intercept)  day$felt_temp  
##        -238.8         2291.5
reg_users = lm(day$registered ~ day$felt_temp)
reg_users
## 
## Call:
## lm(formula = day$registered ~ day$felt_temp)
## 
## Coefficients:
##   (Intercept)  day$felt_temp  
##          1185           5210

Multivariate Exploration

Since all weather variables show significant effects on the number of riders, let's model them jointly. However, because the actual temperature and the felt temperature are virtually identical (r = 0.99), we will use only the actual temperature (temp).
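One way to quantify "virtually identical" is the variance inflation factor, VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the other. With only two predictors R² is simply r², so the reported r = 0.9917 gives a VIF of about 60, far above the thresholds of 5 to 10 that are often quoted as a rule of thumb. A minimal sketch (the `vif_two` helper and the simulated data are hypothetical):

```r
# Hypothetical helper: VIF of x given a single other predictor z
vif_two <- function(x, z) {
  r2 <- summary(lm(x ~ z))$r.squared
  1 / (1 - r2)
}

# From the correlation reported above for temp and felt_temp
r <- 0.9917016
vif_reported <- 1 / (1 - r^2)
round(vif_reported, 1)

# The same quantity computed from hypothetical simulated data
set.seed(1)
temp      <- runif(200, 0.06, 0.86)
felt_temp <- 0.95 * temp + rnorm(200, sd = 0.02)
vif_two(temp, felt_temp)
```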

Conditions Only vs. Weather Data: Casual Users

casual_users = lm(day$casual ~ day$conditions)
casual_users
## 
## Call:
## lm(formula = day$casual ~ day$conditions)
## 
## Coefficients:
##    (Intercept)  day$conditions  
##         1283.1          -311.7
summary(casual_users)
## 
## Call:
## lm(formula = day$casual ~ day$conditions)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -956.4 -460.2 -152.4  231.9 2495.3 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     1283.09      67.73  18.944  < 2e-16 ***
## day$conditions  -311.69      45.22  -6.893 1.19e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 665.7 on 729 degrees of freedom
## Multiple R-squared:  0.06118,    Adjusted R-squared:  0.0599 
## F-statistic: 47.51 on 1 and 729 DF,  p-value: 1.186e-11
anova(casual_users)
## Analysis of Variance Table
## 
## Response: day$casual
##                 Df    Sum Sq  Mean Sq F value    Pr(>F)    
## day$conditions   1  21056843 21056843   47.51 1.186e-11 ***
## Residuals      729 323101979   443213                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(casual_users)

casual_users = lm(day$casual ~ day$temp + day$hum +day$windspeed)
casual_users
## 
## Call:
## lm(formula = day$casual ~ day$temp + day$hum + day$windspeed)
## 
## Coefficients:
##   (Intercept)       day$temp        day$hum  day$windspeed  
##         582.7         2048.0         -855.7        -1111.8
summary(casual_users)
## 
## Call:
## lm(formula = day$casual ~ day$temp + day$hum + day$windspeed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1113.5  -327.2  -156.2   145.2  2296.4 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      582.7      133.4   4.369 1.43e-05 ***
## day$temp        2048.0      115.7  17.703  < 2e-16 ***
## day$hum         -855.7      151.6  -5.646 2.36e-08 ***
## day$windspeed  -1111.8      279.8  -3.973 7.80e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 562.6 on 727 degrees of freedom
## Multiple R-squared:  0.3313, Adjusted R-squared:  0.3286 
## F-statistic: 120.1 on 3 and 727 DF,  p-value: < 2.2e-16
anova(casual_users)
## Analysis of Variance Table
## 
## Response: day$casual
##                Df    Sum Sq   Mean Sq F value    Pr(>F)    
## day$temp        1 101581307 101581307 320.909 < 2.2e-16 ***
## day$hum         1   7454739   7454739  23.550 1.490e-06 ***
## day$windspeed   1   4996687   4996687  15.785 7.802e-05 ***
## Residuals     727 230126089    316542                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(casual_users)

Conditions Only vs. Weather Data: Registered Users

reg_users = lm(day$registered ~ day$conditions)
reg_users
## 
## Call:
## lm(formula = day$registered ~ day$conditions)
## 
## Coefficients:
##    (Intercept)  day$conditions  
##         4696.5          -745.6
summary(reg_users)
## 
## Call:
## lm(formula = day$registered ~ day$conditions)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3534.9 -1055.4   -25.9  1078.6  3638.7 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      4696.5      153.4  30.623  < 2e-16 ***
## day$conditions   -745.6      102.4  -7.282 8.57e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1507 on 729 degrees of freedom
## Multiple R-squared:  0.0678, Adjusted R-squared:  0.06652 
## F-statistic: 53.02 on 1 and 729 DF,  p-value: 8.566e-13
anova(reg_users)
## Analysis of Variance Table
## 
## Response: day$registered
##                 Df     Sum Sq   Mean Sq F value    Pr(>F)    
## day$conditions   1  120491318 120491318  53.023 8.566e-13 ***
## Residuals      729 1656620654   2272456                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(reg_users)

reg_users = lm(day$registered ~ day$temp + day$hum +day$windspeed)
reg_users
## 
## Call:
## lm(formula = day$registered ~ day$temp + day$hum + day$windspeed)
## 
## Coefficients:
##   (Intercept)       day$temp        day$hum  day$windspeed  
##          3502           4577          -2244          -3695
summary(reg_users)
## 
## Call:
## lm(formula = day$registered ~ day$temp + day$hum + day$windspeed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3738.0  -982.8  -160.1   978.2  3165.6 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     3501.7      299.1  11.706  < 2e-16 ***
## day$temp        4577.5      259.5  17.641  < 2e-16 ***
## day$hum        -2244.4      340.0  -6.602 7.84e-11 ***
## day$windspeed  -3695.1      627.6  -5.887 5.99e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1262 on 727 degrees of freedom
## Multiple R-squared:  0.3486, Adjusted R-squared:  0.3459 
## F-statistic: 129.7 on 3 and 727 DF,  p-value: < 2.2e-16
anova(reg_users)
## Analysis of Variance Table
## 
## Response: day$registered
##                Df     Sum Sq   Mean Sq F value    Pr(>F)    
## day$temp        1  518228818 518228818 325.446 < 2.2e-16 ***
## day$hum         1   46037414  46037414  28.911 1.022e-07 ***
## day$windspeed   1   55195487  55195487  34.663 5.989e-09 ***
## Residuals     727 1157650253   1592366                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(reg_users)

How well do weather measurements predict “conditions”?

conditions = lm(day$conditions ~ day$temp + day$hum + day$windspeed)
conditions
## 
## Call:
## lm(formula = day$conditions ~ day$temp + day$hum + day$windspeed)
## 
## Coefficients:
##   (Intercept)       day$temp        day$hum  day$windspeed  
##       -0.1567        -0.5250         2.5131         1.2296
summary(conditions)
## 
## Call:
## lm(formula = day$conditions ~ day$temp + day$hum + day$windspeed)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.91615 -0.29110 -0.07017  0.26731  3.03902 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -0.15674    0.09887  -1.585    0.113    
## day$temp      -0.52504    0.08577  -6.121 1.52e-09 ***
## day$hum        2.51310    0.11237  22.364  < 2e-16 ***
## day$windspeed  1.22962    0.20746   5.927 4.76e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4171 on 727 degrees of freedom
## Multiple R-squared:  0.4164, Adjusted R-squared:  0.414 
## F-statistic: 172.9 on 3 and 727 DF,  p-value: < 2.2e-16
anova(conditions)
## Analysis of Variance Table
## 
## Response: day$conditions
##                Df  Sum Sq Mean Sq F value    Pr(>F)    
## day$temp        1   3.153   3.153   18.12 2.346e-05 ***
## day$hum         1  80.996  80.996  465.54 < 2.2e-16 ***
## day$windspeed   1   6.112   6.112   35.13 4.761e-09 ***
## Residuals     727 126.484   0.174                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(conditions)
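Since `conditions` is an ordered category rather than a measurement, a linear model on the codes 1 to 3 is only an approximation. An ordinal (proportional-odds) logistic regression, available as `polr()` in the MASS package that ships with R, models the ordering of the categories directly. A minimal sketch on hypothetical simulated data, not a refit of the bike data:

```r
library(MASS)  # provides polr(); MASS is a recommended package bundled with R

set.seed(9)
n    <- 300
hum  <- runif(n, 0.3, 0.97)
temp <- runif(n, 0.06, 0.86)
# Hypothetical latent "cloudiness" that rises with humidity and falls with temperature
latent <- 6 * hum - temp + rlogis(n)
cond   <- cut(latent, breaks = c(-Inf, 3.5, 5, Inf),
              labels = c("1", "2", "3"), ordered_result = TRUE)

# Proportional-odds model: one coefficient per predictor plus two cut-points
fit <- polr(cond ~ hum + temp, Hess = TRUE)
summary(fit)
```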

Including Seasonality

casual_users = lm(day$casual ~ day$season)
casual_users
## 
## Call:
## lm(formula = day$casual ~ day$season)
## 
## Coefficients:
## (Intercept)   day$season  
##       523.5        130.1
summary(casual_users)
## 
## Call:
## lm(formula = day$casual ~ day$season)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1041.7  -490.5  -151.5   260.9  2626.4 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   523.49      61.15   8.561  < 2e-16 ***
## day$season    130.05      22.38   5.811 9.29e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 671.7 on 729 degrees of freedom
## Multiple R-squared:  0.04427,    Adjusted R-squared:  0.04296 
## F-statistic: 33.77 on 1 and 729 DF,  p-value: 9.288e-09
anova(casual_users)
## Analysis of Variance Table
## 
## Response: day$casual
##             Df    Sum Sq  Mean Sq F value    Pr(>F)    
## day$season   1  15235157 15235157  33.766 9.288e-09 ***
## Residuals  729 328923665   451198                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(casual_users)

Season and Weather

casual_users = lm(day$casual ~ day$season + day$temp +day$hum +day$windspeed)
casual_users
## 
## Call:
## lm(formula = day$casual ~ day$season + day$temp + day$hum + day$windspeed)
## 
## Coefficients:
##   (Intercept)     day$season       day$temp        day$hum  day$windspeed  
##        546.53          26.17        2001.35        -882.45       -1055.47
summary(casual_users)
## 
## Call:
## lm(formula = day$casual ~ day$season + day$temp + day$hum + day$windspeed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1108.2  -335.6  -151.6   148.3  2337.1 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     546.53     136.27   4.011 6.68e-05 ***
## day$season       26.17      20.44   1.281 0.200739    
## day$temp       2001.35     121.25  16.505  < 2e-16 ***
## day$hum        -882.45     152.94  -5.770 1.17e-08 ***
## day$windspeed -1055.47     283.14  -3.728 0.000208 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 562.4 on 726 degrees of freedom
## Multiple R-squared:  0.3328, Adjusted R-squared:  0.3292 
## F-statistic: 90.55 on 4 and 726 DF,  p-value: < 2.2e-16
anova(casual_users)
## Analysis of Variance Table
## 
## Response: day$casual
##                Df    Sum Sq  Mean Sq F value    Pr(>F)    
## day$season      1  15235157 15235157  48.172 8.664e-12 ***
## day$temp        1  86666882 86666882 274.034 < 2.2e-16 ***
## day$hum         1   8254671  8254671  26.101 4.147e-07 ***
## day$windspeed   1   4394686  4394686  13.896 0.0002083 ***
## Residuals     726 229607427   316264                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(casual_users)
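This model lets season shift the baseline number of casual riders (and, because season enters as the number 1 to 4, only along a straight line), but it forces the temperature, humidity and wind effects to be the same in every season, so it does not directly test Q2. Interaction terms give each season its own slope, and a nested F-test shows whether they improve the fit. A minimal sketch on hypothetical simulated data, with season entered as a factor:

```r
set.seed(11)
n <- 400
season <- factor(rep(c("spring", "summer", "fall", "winter"), each = n / 4),
                 levels = c("spring", "summer", "fall", "winter"))
temp <- runif(n, 0.1, 0.85)
# Hypothetical data in which the temperature slope differs by season
slope  <- c(spring = 3000, summer = 1000, fall = 1500, winter = 2000)[as.character(season)]
riders <- 300 + slope * temp + rnorm(n, sd = 250)

additive    <- lm(riders ~ season + temp)  # common temperature slope
interaction <- lm(riders ~ season * temp)  # one temperature slope per season
# F-test of whether season-specific slopes improve the fit
anova(additive, interaction)
```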

Month and Weather

casual_users = lm(day$casual ~ day$month + day$temp +day$hum +day$windspeed)
casual_users
## 
## Call:
## lm(formula = day$casual ~ day$month + day$temp + day$hum + day$windspeed)
## 
## Coefficients:
##   (Intercept)      day$month       day$temp        day$hum  day$windspeed  
##       570.046          3.594       2036.031       -870.145      -1089.629
summary(casual_users)
## 
## Call:
## lm(formula = day$casual ~ day$month + day$temp + day$hum + day$windspeed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1110.4  -327.2  -152.5   147.0  2305.6 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     570.046    135.307   4.213 2.84e-05 ***
## day$month         3.594      6.378   0.563 0.573291    
## day$temp       2036.031    117.695  17.299  < 2e-16 ***
## day$hum        -870.145    153.784  -5.658 2.20e-08 ***
## day$windspeed -1089.629    282.710  -3.854 0.000126 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 562.9 on 726 degrees of freedom
## Multiple R-squared:  0.3316, Adjusted R-squared:  0.3279 
## F-statistic: 90.06 on 4 and 726 DF,  p-value: < 2.2e-16
anova(casual_users)
## Analysis of Variance Table
## 
## Response: day$casual
##                Df    Sum Sq  Mean Sq F value    Pr(>F)    
## day$month       1   5207277  5207277  16.435 5.578e-05 ***
## day$temp        1  96378141 96378141 304.186 < 2.2e-16 ***
## day$hum         1   7841236  7841236  24.748 8.163e-07 ***
## day$windspeed   1   4706674  4706674  14.855 0.0001264 ***
## Residuals     726 230025494   316840                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(casual_users)
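Entering `month` as the numbers 1 to 12 assumes a straight-line trend from January to December and places December and January at opposite ends of the scale, even though they are adjacent. A common alternative for a cyclic variable is a pair of sine and cosine terms. A minimal sketch on hypothetical simulated monthly data that peak in July:

```r
set.seed(21)
month <- rep(1:12, times = 10)
# Hypothetical riders peaking in July and lowest in January
riders <- 800 + 400 * cos(2 * pi * (month - 7) / 12) + rnorm(length(month), sd = 60)

fit_linear   <- lm(riders ~ month)
fit_harmonic <- lm(riders ~ sin(2 * pi * month / 12) + cos(2 * pi * month / 12))

# Share of variance explained by each encoding of month
c(linear   = summary(fit_linear)$r.squared,
  harmonic = summary(fit_harmonic)$r.squared)
```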

Days and Weather

plot(day$weekday, day$casual)

plot(day$weekday, day$registered)
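Per the dataset documentation, `workingday` is 1 exactly when a day is neither a weekend day nor a holiday, so it is determined by `weekday` and `holiday`; the models below include all three, which makes their separate coefficients harder to interpret. In addition, with Sunday = 0 and Saturday = 6, a single linear `weekday` slope puts the two weekend days at opposite ends. A minimal sketch that reconstructs `workingday` (the `is_working_day` helper is hypothetical), checked against the first ten records in the str() output:

```r
# Hypothetical helper: 1 on Monday (1) to Friday (5) unless the day is a holiday
is_working_day <- function(weekday, holiday) {
  as.integer(weekday %in% 1:5 & holiday == 0)
}

# First ten records from the str() output above
weekday    <- c(6, 0, 1, 2, 3, 4, 5, 6, 0, 1)
holiday    <- rep(0, 10)
workingday <- c(0, 0, 1, 1, 1, 1, 1, 0, 0, 1)

all(is_working_day(weekday, holiday) == workingday)
```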

casual_users = lm(day$casual ~ day$workingday + day$weekday + day$holiday + day$temp +day$hum +day$windspeed)
casual_users
## 
## Call:
## lm(formula = day$casual ~ day$workingday + day$weekday + day$holiday + 
##     day$temp + day$hum + day$windspeed)
## 
## Coefficients:
##    (Intercept)  day$workingday     day$weekday     day$holiday  
##         1017.0          -835.4            22.8          -277.9  
##       day$temp         day$hum   day$windspeed  
##         2144.5          -798.5         -1148.6
summary(casual_users)
## 
## Call:
## lm(formula = day$casual ~ day$workingday + day$weekday + day$holiday + 
##     day$temp + day$hum + day$windspeed)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1437.96  -222.00   -10.43   163.64  1678.57 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     1017.041    104.125   9.768  < 2e-16 ***
## day$workingday  -835.431     34.125 -24.481  < 2e-16 ***
## day$weekday       22.803      7.703   2.960  0.00318 ** 
## day$holiday     -277.935     95.316  -2.916  0.00366 ** 
## day$temp        2144.543     85.334  25.131  < 2e-16 ***
## day$hum         -798.516    111.828  -7.141 2.27e-12 ***
## day$windspeed  -1148.571    206.140  -5.572 3.56e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 414.4 on 724 degrees of freedom
## Multiple R-squared:  0.6387, Adjusted R-squared:  0.6357 
## F-statistic: 213.3 on 6 and 724 DF,  p-value: < 2.2e-16
anova(casual_users)
## Analysis of Variance Table
## 
## Response: day$casual
##                 Df    Sum Sq   Mean Sq F value    Pr(>F)    
## day$workingday   1  92361829  92361829 537.716 < 2.2e-16 ***
## day$weekday      1   2121526   2121526  12.351 0.0004681 ***
## day$holiday      1   1792826   1792826  10.438 0.0012906 ** 
## day$temp         1 111988413 111988413 651.979 < 2.2e-16 ***
## day$hum          1   6202415   6202415  36.109 2.954e-09 ***
## day$windspeed    1   5332509   5332509  31.045 3.557e-08 ***
## Residuals      724 124359304    171767                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(casual_users)

reg_users = lm(day$registered ~ day$workingday + day$weekday + day$holiday + day$temp +day$hum +day$windspeed)
reg_users
## 
## Call:
## lm(formula = day$registered ~ day$workingday + day$weekday + 
##     day$holiday + day$temp + day$hum + day$windspeed)
## 
## Coefficients:
##    (Intercept)  day$workingday     day$weekday     day$holiday  
##        2873.22          907.81           28.87         -221.32  
##       day$temp         day$hum   day$windspeed  
##        4455.64        -2274.75        -3659.70
summary(reg_users)
## 
## Call:
## lm(formula = day$registered ~ day$workingday + day$weekday + 
##     day$holiday + day$temp + day$hum + day$windspeed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4094.8  -943.0   -32.0   865.3  2874.8 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     2873.22     297.77   9.649  < 2e-16 ***
## day$workingday   907.81      97.59   9.302  < 2e-16 ***
## day$weekday       28.87      22.03   1.311    0.190    
## day$holiday     -221.32     272.58  -0.812    0.417    
## day$temp        4455.64     244.04  18.258  < 2e-16 ***
## day$hum        -2274.75     319.80  -7.113 2.73e-12 ***
## day$windspeed  -3659.70     589.51  -6.208 9.03e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1185 on 724 degrees of freedom
## Multiple R-squared:  0.4277, Adjusted R-squared:  0.423 
## F-statistic: 90.18 on 6 and 724 DF,  p-value: < 2.2e-16
anova(reg_users)
## Analysis of Variance Table
## 
## Response: day$registered
##                 Df     Sum Sq   Mean Sq  F value    Pr(>F)    
## day$workingday   1  164133237 164133237 116.8411 < 2.2e-16 ***
## day$weekday      1    3845951   3845951   2.7378   0.09843 .  
## day$holiday      1    1451854   1451854   1.0335   0.30967    
## day$temp         1  488776353 488776353 347.9439 < 2.2e-16 ***
## day$hum          1   47722372  47722372  33.9720 8.417e-09 ***
## day$windspeed    1   54138590  54138590  38.5395 9.034e-10 ***
## Residuals      724 1017043617   1404756                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(reg_users)

Current Final Model

casual_users = lm(day$casual ~ day$workingday + day$temp +day$hum +day$windspeed)
casual_users
## 
## Call:
## lm(formula = day$casual ~ day$workingday + day$temp + day$hum + 
##     day$windspeed)
## 
## Coefficients:
##    (Intercept)  day$workingday        day$temp         day$hum  
##         1063.6          -806.6          2149.5          -812.7  
##  day$windspeed  
##        -1145.3
summary(casual_users)
## 
## Call:
## lm(formula = day$casual ~ day$workingday + day$temp + day$hum + 
##     day$windspeed)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1345.19  -217.83   -10.19   170.03  1769.42 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     1063.55     101.37  10.492  < 2e-16 ***
## day$workingday  -806.63      33.41 -24.143  < 2e-16 ***
## day$temp        2149.52      86.32  24.901  < 2e-16 ***
## day$hum         -812.74     112.98  -7.194 1.57e-12 ***
## day$windspeed  -1145.31     208.55  -5.492 5.51e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 419.3 on 726 degrees of freedom
## Multiple R-squared:  0.6291, Adjusted R-squared:  0.6271 
## F-statistic: 307.9 on 4 and 726 DF,  p-value: < 2.2e-16
anova(casual_users)
## Analysis of Variance Table
## 
## Response: day$casual
##                 Df    Sum Sq   Mean Sq F value    Pr(>F)    
## day$workingday   1  92361829  92361829 525.333 < 2.2e-16 ***
## day$temp         1 112350448 112350448 639.023 < 2.2e-16 ***
## day$hum          1   6501869   6501869  36.981 1.928e-09 ***
## day$windspeed    1   5302333   5302333  30.158 5.510e-08 ***
## Residuals      726 127642343    175816                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(2,2))
plot(casual_users)

reg_users = lm(day$registered ~ day$workingday + day$temp +day$hum +day$windspeed)
reg_users
## 
## Call:
## lm(formula = day$registered ~ day$workingday + day$temp + day$hum + 
##     day$windspeed)
## 
## Coefficients:
##    (Intercept)  day$workingday        day$temp         day$hum  
##         2945.8           932.4          4460.2         -2294.1  
##  day$windspeed  
##        -3656.4
summary(reg_users)
## 
## Call:
## lm(formula = day$registered ~ day$workingday + day$temp + day$hum + 
##     day$windspeed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4079.2  -912.9   -41.4   868.6  2873.4 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     2945.82     286.66  10.276  < 2e-16 ***
## day$workingday   932.44      94.48   9.869  < 2e-16 ***
## day$temp        4460.19     244.11  18.271  < 2e-16 ***
## day$hum        -2294.09     319.48  -7.181 1.72e-12 ***
## day$windspeed  -3656.39     589.75  -6.200 9.48e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1186 on 726 degrees of freedom
## Multiple R-squared:  0.4256, Adjusted R-squared:  0.4225 
## F-statistic: 134.5 on 4 and 726 DF,  p-value: < 2.2e-16
anova(reg_users)
## Analysis of Variance Table
## 
## Response: day$registered
##                 Df     Sum Sq   Mean Sq F value    Pr(>F)    
## day$workingday   1  164133237 164133237 116.743 < 2.2e-16 ***
## day$temp         1  489324633 489324633 348.043 < 2.2e-16 ***
## day$hum          1   48906300  48906300  34.786 5.641e-09 ***
## day$windspeed    1   54041429  54041429  38.438 9.478e-10 ***
## Residuals      726 1020706374   1405932                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(2,2))
plot(reg_users)
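Using these models for staffing and fleet planning means predicting demand for a specific forecast day. `predict()` with `newdata` works reliably only when the formula names columns and the data frame is passed through `data =`; with `day$casual ~ day$temp`-style formulas it ignores `newdata` and returns fitted values for the original 731 days, with a warning. A minimal sketch on hypothetical simulated data loosely shaped after the casual-user coefficients above; the forecast-day values are also made up:

```r
set.seed(5)
n <- 200
# Hypothetical training data on the same normalized scales as the bike data
train <- data.frame(
  workingday = rbinom(n, 1, 0.68),
  temp       = runif(n, 0.06, 0.86),
  hum        = runif(n, 0.3, 0.97),
  windspeed  = runif(n, 0.02, 0.5)
)
train$casual <- 1064 - 807 * train$workingday + 2150 * train$temp -
  813 * train$hum - 1145 * train$windspeed + rnorm(n, sd = 420)

# Column names in the formula plus data = lets predict() use new rows
fit <- lm(casual ~ workingday + temp + hum + windspeed, data = train)

# Hypothetical forecast: a working day at about 25 degrees C (0.6 * 41)
forecast_day <- data.frame(workingday = 1, temp = 0.6, hum = 0.55, windspeed = 0.15)
pred <- predict(fit, newdata = forecast_day, interval = "prediction", level = 0.95)
pred
```

For planning purposes the prediction interval (`lwr`, `upr`) is arguably more useful than the point estimate alone, since it indicates how much spare capacity a given day may need.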